Cross-Lingual Sentiment Analysis with Machine Translation
Abstract
Recent advances in machine translation have fostered interest in its use for sentiment analysis. This thesis investigates the prospects and limitations of using machine translation in cross-lingual sentiment analysis. Sentiment analysis requires learning linguistic features, either with tools such as part-of-speech taggers and parsers or from basic resources such as annotated corpora and sentiment lexica. We are motivated to study the translation of existing English resources because building such tools and resources for each language requires considerable human effort, which severely limits the implementation of language-specific sentiment analysis techniques similar to those developed for English. Labeled corpora and sentiment lexica are the two main resources in applied sentiment analysis. We translate them into a language with limited resources, focusing on improving classification accuracy when (labeled or raw) training instances are available. In some cases, however, no training data may be available. To address this scenario, we explore methods for translating sentiment lexica into a target language, and we also try to improve machine translation performance by generating additional context. All experiments use English and Turkish data consisting of movie and product reviews, and we perform two-class (positive-negative) classification, that is, polarity detection, in which the neutral class is discarded. We obtain promising results in polarity detection experiments that use general-purpose classifiers trained on translated corpora; we note, however, that dissimilarities between corpora in different languages should be studied further for better integration of resources.
We also find quantitative evidence suggesting that lexicon translation is more troublesome, since inherent differences in how the two languages express sentiment make it harder to preserve the sentiment of words and phrases in translation.
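The lexicon-translation scenario described in the abstract (no training data in the target language) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English lexicon, the `toy_translations` dictionary standing in for a machine translation system, and the function names `translate_lexicon` and `classify`. It is not the method or data used in the thesis.

```python
# Toy English sentiment lexicon: word -> polarity (+1 positive, -1 negative).
# Illustrative entries only; not taken from the thesis.
english_lexicon = {"great": 1, "beautiful": 1, "bad": -1, "terrible": -1}

# Hypothetical stand-in for a machine translation system (English -> Turkish).
toy_translations = {
    "great": "harika",
    "beautiful": "güzel",
    "bad": "kötü",
    "terrible": "berbat",
}


def translate_lexicon(lexicon, translations):
    """Carry polarity labels over to target-language words.

    Entries without a translation are skipped.
    """
    return {translations[w]: p for w, p in lexicon.items() if w in translations}


def classify(text, lexicon):
    """Two-class polarity detection by summing word polarities.

    Unknown words count as 0; a tie (score 0) falls back to 'positive'
    because the neutral class is discarded.
    """
    score = sum(lexicon.get(token, 0) for token in text.split())
    return "positive" if score >= 0 else "negative"


turkish_lexicon = translate_lexicon(english_lexicon, toy_translations)
print(classify("harika bir film", turkish_lexicon))
print(classify("berbat bir film", turkish_lexicon))
```

A word-for-word projection like this ignores context and morphology, which is one reason translated lexica may fail to preserve sentiment across languages.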
Similar resources
Cross-Lingual Sentiment Analysis Without (Good) Translation
Current approaches to cross-lingual sentiment analysis try to leverage the wealth of labeled English data using bilingual lexicons, bilingual vector space embeddings, or machine translation systems. Here we show that it is possible to use a single linear transformation, with as few as 2000 word pairs, to capture fine-grained sentiment relationships between words in a cross-lingual setting. We a...
Co-Training for Cross-Lingual Sentiment Classification
The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification. However, there are many freely available English sentiment corpora on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine tr...
Cross-lingual sentiment classification: Similarity discovery plus training data adjustment
The performance of cross-lingual sentiment classification is sharply limited by the language gap, which means that each language has its own ways to express sentiments. Many methods have been designed to transmit sentiment information across languages by making use of machine translation, parallel corpora, auxiliary unlabeled samples and other resources. In this paper, a new approach is propose...
Exploring Distributional Representations and Machine Translation for Aspect-based Cross-lingual Sentiment Classification
Cross-lingual sentiment classification (CLSC) seeks to use resources from a source language in order to detect sentiment and classify text in a target language. Almost all research into CLSC has been carried out at sentence and document level, although this level of granularity is often less useful. This paper explores methods for performing aspect-based cross-lingual sentiment classification (...
Combination of Multi-view Multi-source Language Classifiers for Cross-Lingual Sentiment Classification
Cross-lingual sentiment classification aims to conduct sentiment classification in a target language using labeled sentiment data in a source language. Most existing research works rely on machine translation to directly project information from one language to another. But cross-lingual classifiers always cannot learn all characteristics of target language data by using only translated data fr...
The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
Expensive feature engineering based on WordNet senses has been shown to be useful for document level sentiment classification. A plausible reason for such a performance improvement is the reduction in data sparsity. However, such a reduction could be achieved with a lesser effort through the means of syntagma based word clustering. In this paper, the problem of data sparsity in sentiment analys...
Publication date: 2013